C++ Library for Easy Command-Line Parsing
by John M. Dlugosz

I've always felt that the argv[] array was difficult to use. Not bad, just _primitive_. If all you have are a couple of arguments, it is not too hard. But you still have to check for the correct count and convert each value to the proper type. If your program has various flags and switches, things can get much more difficult.

How many programs have you written and suffered through the argument processing? In how many programs have you _wished_ you had a better way? In my case, I've written many simple programs that could have benefited from command-line arguments, but found them more trouble than they were worth. So I was stuck with a simpler, less flexible program. For test code and such, I would even change a value and recompile instead of adding proper command-line processing.

Now, I do have a simple way. It has revolutionized the way I write small programs. Rich command-line argument processing, sign-on messages, and help on usage are now trivial.

Here is an example. Consider a program that takes a `-v' switch for verbose mode. Using this library, this is accomplished by including the definition

    cmdl_flag v ('v', "requests verbose mode");

to make the program recognize the flag, and code such as

    if (v()) {
       //do this in verbose mode
       //whatever...
       }

to respond to the state of this flag. There is no messy string manipulation or error checking. The library automatically handles the `-v' or `/v' forms, disabling a switch with `-v-', cascading switches such as `-vbx', and other features.

Notice that the definition of `v' above takes two constructor arguments. The second argument is a string that provides usage information. The library will automatically generate the usage message, collecting the messages from all the parameters in the program.

Concepts
--------

The basic idea is to model command-line parameters as program arguments. That is, they should be analogous to arguments passed to a function.
In a function call, each value passed is bound to a name in the called function. By analogy, a program argument is a name which gets bound to something which can be specified on the command line.

To provide for command-line input, you declare those arguments you want to receive, along with their types. The cmdl library has a type for each type of command-line parameter: flags, integers, strings (more can be added). The constructor is given the name of the parameter, as used on the command line. It can also be given a help string, and flags. Here are some examples:

    typedef cmdl_flag flag;
    flag v ('v', "requests verbose mode");
    flag s ('s', "specifies alternate algorithm");
    flag T ('T', "prevents the foobar from clearing (debugging)", cmdl::once);
    cmdl_string pos1 ((char*)0, "first positional parameter", cmdl::required);
    cmdl_string pos2 ((char*)0, "second positional parameter");
    cmdl_string pos3 ((char*)0, "third positional parameter");
    cmdl_int count ('c', "iteration count");
    cmdl_help helper;

This shows the following types:

* Type cmdl_flag is a simple switch. Using that flag makes the parameter TRUE; if absent it is FALSE. You can also turn off the switch by using the name with a trailing `-' sign. (The library takes care of cascading switches, too.)

* Type cmdl_string allows input of an arbitrary string. The syntax is somewhat flexible, with the argument separated from the keyword by a space or an `=', and the string can be in quotes.

* Type cmdl_int allows input of an integer. The input is checked for valid syntax.

* Type cmdl_help provides for an automatically generated help screen if the command line is empty, or with the `-?' switch.

Except for the special cmdl_help class, the constructors take two or three arguments. The first is the name of the command-line parameter. This can be given as a single char or as a string. If passed `(char*)0', there will be no name and it is taken to be a positional parameter, explained later.
The second constructor argument is the usage help string.

The optional third argument to the constructors is a bank of flags. `once' indicates that the argument can only appear once in the command line. Ordinarily, repeating it will override the previous mention. The `required' flag means that it is an error to omit the parameter. There are others, detailed in the code listing.

A flag worth particular attention is `keyword'. If present, then the command-line parameter name will not use the switchchar ('-' or '/') to indicate that this is a parameter. If a keyword is found anyplace outside of a quoted string, it will be used as an instance of the parameter.

Using class cmdl in a program
-----------------------------

The program that contains these definitions will kick off everything by calling `cmdl::parseit();'. This works because the constructor of each command-line argument object links it into a linked list. The command-line argument objects should be global, or defined in `main' before calling `parseit()'. In any case, no command-line object should ever go out of scope before `parseit()' is called.

Because the objects link themselves up, the complete collection of defined command-line parameters is known. `parseit()' will parse the command line and compare what it finds with the list of possible arguments. It takes care of usage errors and such, so the program aborts if the command line is invalid. No error checking is required by the main program.

Each command-line-parameter object contains an `operator()' which provides a succinct way to get the value of that parameter. There is a default value in case it was not specified on the command line. If you would rather check for its presence, use the `hasvalue()' member.

Before calling `parseit()', you can use the static member `signon()' to register a string that is used during the usage help message.

See the listing of TEST3.CPP and other files for usage of the examples described above.
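
The self-registration idea can be sketched in a few lines. This is not the code from CMDL.H: the names `sketch_param', `sketch_flag', and `parse_switches' are made up for illustration, and the sketch only understands single-letter switches (`-v', `/v', `-v-'), with no cascading, keywords, or positional parameters.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical base class: every parameter object pushes itself onto a
// static singly linked list when it is constructed. As in the article,
// objects must stay alive until parsing is finished.
class sketch_param {
public:
    sketch_param(char name, const char* help)
        : name_(name), help_(help), next_(head_) {
        assert(help != 0);   // every parameter carries a usage string
        head_ = this;        // link this object in at the front of the list
    }
    virtual ~sketch_param() {}

    const char* help() const { return help_; }

    // Called by the parser; `on` is false when the switch ends in '-'.
    virtual void set(bool on) = 0;

    // Look up each argument in the list and hand it to the matching object.
    // Returns false on a malformed or unknown switch.
    static bool parse_switches(const std::vector<std::string>& args);

private:
    char name_;
    const char* help_;
    sketch_param* next_;
    static sketch_param* head_;
};

sketch_param* sketch_param::head_ = 0;

bool sketch_param::parse_switches(const std::vector<std::string>& args) {
    for (std::size_t i = 0; i < args.size(); ++i) {
        const std::string& a = args[i];
        bool well_formed = (a.size() == 2 || (a.size() == 3 && a[2] == '-'))
                           && (a[0] == '-' || a[0] == '/');
        if (!well_formed) return false;
        sketch_param* p = head_;
        while (p != 0 && p->name_ != a[1]) p = p->next_;
        if (p == 0) return false;   // nothing registered under that name
        p->set(a.size() == 2);      // a trailing '-' switches the flag off
    }
    return true;
}

// Hypothetical flag type with operator() as the value accessor.
class sketch_flag : public sketch_param {
public:
    sketch_flag(char name, const char* help)
        : sketch_param(name, help), value_(false) {}
    virtual void set(bool on) { value_ = on; }
    bool operator()() const { return value_; }
private:
    bool value_;
};
```

In this sketch, defining `sketch_flag v ('v', "requests verbose mode");' before calling `sketch_param::parse_switches()' with the single argument "-v" makes `v()' return true. The parser never needs to be told which parameters exist, because each object has already added itself to the list.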

Kinds of argument names: char, string, and positional
-----------------------------------------------------

The first argument to the constructor of the command-line parameter objects is the name of the parameter that will be used on the command line. The constructor has two forms. It can take a char, used for a single-letter switch. Or it can take a string (char*), for arbitrary names.

In addition, the string form responds to a special name of NULL. Passing in `(char*)0' for the name makes it a positional parameter. The parser will not assign it based on a name. Instead, it is used for unnamed parameters. If a parameter does not start with a '-' or '/', and it does not match the name of a keyword parameter (those that don't use the '-'), it is taken to be a positional parameter. It is assigned to the first unused positional parameter you defined. This lets you mix switches with non-named parameters such as filenames.

Note that positional parameters can be flagged as `required'.

The Use of C++
--------------

A few C++ language concepts may need explaining. Note the syntax of the flags in the third constructor argument.

    cmdl::required | cmdl::keyword

The names here are enumeration constants. They are created with an `enum' definition (see CMDL.H, line 50). The names are defined within the class, and are in the scope of the class. They are not global, and don't pollute the global namespace. So, you have no conflict with a name `keyword' used elsewhere in the program, for example. The downside of this is that you qualify the name with its classname, as shown. Note that in C, you probably would have seen `CMDL_KEYWORD' instead; the name would contain its "family" identifier as part of itself. So it really is not additional typing to use class-scoped names like these.

The enumeration constants are given explicit values as powers of 2, so they behave as flags which can be combined with | or +.
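
A minimal sketch of class-scoped flag constants follows. The class name `sketch_cmdl' and the numeric values are assumptions made for illustration; only the constant names `once', `required', and `keyword' come from the article.

```cpp
#include <cassert>

// Hypothetical class, not the real cmdl. The unnamed enum only introduces
// constants in the scope of the class; the values are assumed powers of 2.
class sketch_cmdl {
public:
    enum { once = 1, required = 2, keyword = 4 };

    // The flags arrive as a plain unsigned value built with | or +.
    explicit sketch_cmdl(unsigned flags = 0) : flags_(flags) {
        // Reject bits that have no meaning in this sketch.
        assert((flags & ~static_cast<unsigned>(once | required | keyword)) == 0);
    }

    bool is_once() const     { return (flags_ & once) != 0; }
    bool is_required() const { return (flags_ & required) != 0; }
    bool is_keyword() const  { return (flags_ & keyword) != 0; }

private:
    unsigned flags_;
};

// A global with the same spelling does not clash with sketch_cmdl::keyword.
const char* keyword = "unrelated global name";
```

Writing `sketch_cmdl p (sketch_cmdl::required | sketch_cmdl::keyword);' sets both bits and leaves `once' clear. Because each constant occupies its own bit, + gives the same result as | as long as no flag is named twice.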
The function parameter that takes the flags is defined as unsigned, not as an enum type (in fact, the enum type has no name; it just defines the constants). This is necessary because the result of | or + is an int, not an enum type.

The class contains two definitions of enum names for flags that share the same flags variables. But some are public and some are protected.

Another interesting feature is the use of `operator()'. See CMDL.H lines 84, 97, and 109. The operator is defined with the name `operator()' which is then followed by the parameter list. Here it has no parameters, so you see two sets of ()'s in a row. The operator is invoked by following the object name with the parameter list, as shown in the test programs.

The positional parameter ability requires you to pass `(char*)0' instead of just 0 because a bare 0 is ambiguous: it can be a char '\0' or a null pointer.

The Parser
----------

The core parser code breaks up the command line into tokens and looks up names of parameters. The value of a parameter is sent to the matched object for conversion to the proper type. The virtual `scan()' function does this final part.

An earlier version of this library had a seemingly more flexible system that allowed significant customization in the specific parameter type's code. However, it proved too clumsy and was never really used. This points out a good design philosophy: make a thing just flexible enough. If it is too configurable, it can become as difficult to use as writing the code each time, or more so, which is exactly what the library is supposed to avoid.

The parser uses a class cmdlscan for low-level character manipulation and tokenizing. It is planned to give this more power in the future, for better error reporting. Some of the implementation details are the way they are for that reason.

The need for a parser class was indicated because several related values, including the string and its current scan position, were always being passed together.
When things like this happen, think about combining them into an object.

Error and Report Output
-----------------------

I did not want the code to simply use `cerr' and `cout' for output. The library may be used in programs that have their own idea of I/O, including programs that run in graphics mode. For maximum flexibility, all output is separated. The final results are funneled through a pair of functions called `cmdl::output()', both defined in OUTPUT.CPP. If linking in CMDL.LIB, you can supply your own versions of these two functions to handle output your way, without having to recompile the cmdl library.
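
The funnel idea can be sketched as follows. Everything here is hypothetical: the namespace `sketch_out', the function names, and their signatures are assumptions, since the article does not show the real `cmdl::output()' declarations. The sketch also keeps both sides in one file, whereas the library arranges the replacement at link time.

```cpp
#include <string>

namespace sketch_out {

// Stand-ins for a pair of output routines. A replacement like this one
// collects the text in strings, so a program running in graphics mode
// could draw it in a window instead of writing to cerr or cout.
std::string captured_errors;
std::string captured_reports;

void output_error(const char* text)  { captured_errors += text; }
void output_report(const char* text) { captured_reports += text; }

// Library-side code never touches the streams directly; it only calls
// the two funnel functions above.
void usage_error(const char* what) {
    output_error("usage error: ");
    output_error(what);
    output_error("\n");
}

void show_help(const char* signon, const char* usage) {
    output_report(signon);
    output_report("\n");
    output_report(usage);
    output_report("\n");
}

} // namespace sketch_out
```

Because every message passes through the two funnel functions, changing where output goes means changing only those two function bodies; the callers stay as they are.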